An analysis of transcription consistency in spontaneous speech from the Buckeye corpus

Authors

William D. Raymond

Mark A. Pitt

Keith Johnson

Elizabeth Hume

Matthew J. Makashay

Robin Dautricourt

Craig Hilts

Abstract

We present a preliminary analysis of transcriber consistency in labeling and segmentation of words and phones in the Buckeye corpus of spontaneous, informal speech. We find that pairwise inter-transcriber agreement on exact phone label match was 76%, and segmentation agreement within 20% of phone pair length was 75%, though longer phones are more consistently segmented than shorter phones. Patterns of consistency variation in labeling are observed as a function of phonetic categories that are similar to patterns reported for read speech. More agreement is seen on consonants than on vowels, and on fricatives and labials than on other consonant classes. In general, we find that shorter, more reduced words and phones result in more transcriber disagreement.
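The two agreement figures in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the exact procedure, so everything below is an assumption: phones from different transcribers are taken to be already aligned one-to-one, "within 20% of phone pair length" is read as a boundary difference of at most 20% of the mean duration of the two paired phones, and the function names and sample data are hypothetical.

```python
from itertools import combinations

# A phone is (label, start_sec, end_sec).
# ASSUMPTION: each transcriber's list is already aligned phone-by-phone
# with the others; real transcripts would need an alignment step first.

def label_agreement(a, b):
    """Fraction of aligned phone pairs whose labels match exactly."""
    matches = sum(1 for (la, _, _), (lb, _, _) in zip(a, b) if la == lb)
    return matches / len(a)

def boundary_agreement(a, b, tolerance=0.20):
    """Fraction of aligned pairs whose end boundaries differ by at most
    tolerance * (mean duration of the two phones).
    ASSUMPTION: this is one plausible reading of the 20% criterion."""
    agree = 0
    for (_, sa, ea), (_, sb, eb) in zip(a, b):
        mean_len = ((ea - sa) + (eb - sb)) / 2
        if abs(ea - eb) <= tolerance * mean_len:
            agree += 1
    return agree / len(a)

def mean_pairwise(transcripts, metric):
    """Average a two-transcript metric over all transcriber pairs."""
    pairs = list(combinations(transcripts, 2))
    return sum(metric(a, b) for a, b in pairs) / len(pairs)

# Hypothetical transcriptions of one word by three transcribers.
t1 = [("k", 0.00, 0.08), ("ae", 0.08, 0.20), ("t", 0.20, 0.26)]
t2 = [("k", 0.00, 0.09), ("eh", 0.09, 0.20), ("t", 0.20, 0.30)]
t3 = [("k", 0.00, 0.08), ("ae", 0.08, 0.21), ("t", 0.21, 0.26)]

print(mean_pairwise([t1, t2, t3], label_agreement))
print(mean_pairwise([t1, t2, t3], boundary_agreement))
```

Because the tolerance scales with phone duration, a fixed boundary offset that counts as agreement on a long phone can count as disagreement on a short one, so the criterion is harder to meet in absolute terms for short phones.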


Similar resources

An analysis of coding consistency in the transcription of spontaneous speech from the Buckeye corpus


The Buckeye corpus of conversational speech: labeling conventions and a test of transcriber reliability

This paper describes the Buckeye corpus of spontaneous American English speech, a 307,000-word corpus containing the speech of 40 talkers from central Ohio, USA. The method used to elicit and record the speech is described, followed by a description of the protocol that was developed to phonemically label what talkers said. The results of a test of labeling consistency are then presented. The c...


Understanding VOT Variation in Spontaneous Speech

This paper reports a corpus study on the variation of VOT in voiceless stops in spontaneous speech. Two speakers’ data from the Buckeye corpus are used: one is an older female speaker with a low speaking rate while the other is a younger male speaker with an extremely high speaking rate. Linear regression analysis shows that place of articulation, word frequency, phonetic context, speech rate a...


The Buckeye corpus of speech: updates and enhancements

This paper describes recent progress in the development of the Buckeye Corpus of Speech, a phonetically labeled corpus of conversational American English speech, first described in [1]. With the publication of the second phase of transcription, the corpus has nearly doubled in size from the first release. We briefly give an overview of the corpus, report on additional studies of inter-labeler a...


Improving transcription agreement of non-native English speech corpus transcribed by non-natives

This paper proposes an economical and effective phonetic transcription method for dealing with a large amount of nonnative English speech corpus. The method provides a consistent transcription agreement, although the corpus is transcribed by non-natives. To minimize the possibility of confusion in transcription process, forced aligned phone sequences and a set of possible mispronunciation candi...




Publication date: 2002
